Machine Translation Evaluation Metric for Text Alignment

Authors

  • Prasha Shrestha
  • Suraj Maharjan
  • Thamar Solorio

Abstract

As plagiarisers become cleverer, plagiarism detection becomes harder. Plagiarisers keep finding new ways to obfuscate plagiarised passages so that neither humans nor automatic plagiarism detectors can identify them. A plagiarism detection system therefore needs to be robust enough to detect plagiarism regardless of which obfuscation techniques have been applied. Our system attempts to achieve this by combining two methods: a stricter one to catch mildly obfuscated passages and a more lenient one to catch harder ones. We use a machine translation evaluation metric and n-gram matching to detect overlaps between the source and suspicious documents. On the PAN'14 corpus, which contains data with various types of obfuscation to mimic human plagiarisers, we obtained plagdet scores of 0.84404 and 0.86806 on its two datasets.
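The abstract does not specify how the two passes work internally. The Python sketch below illustrates only the n-gram matching half, under assumed settings: word n-grams shared by the source and suspicious documents serve as seeds, nearby seeds are merged into aligned passages, and a "strict" and a "lenient" pass differ only in the hypothetical parameters `n` and `max_gap`. The machine translation evaluation metric component is not reproduced, and this is not the authors' implementation. The `plagdet` helper follows the standard PAN definition (F1 divided by log2(1 + granularity)) and assumes precision, recall and granularity have already been computed.

```python
import math
import re
from typing import Dict, List, Tuple

# (src_start, src_end, susp_start, susp_end) as token offsets, end exclusive.
Passage = Tuple[int, int, int, int]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngram_seeds(src: List[str], susp: List[str], n: int) -> List[Tuple[int, int]]:
    """Return (src_pos, susp_pos) start offsets of every word n-gram shared by both documents."""
    index: Dict[Tuple[str, ...], List[int]] = {}
    for i in range(len(src) - n + 1):
        index.setdefault(tuple(src[i:i + n]), []).append(i)
    seeds = [
        (i, j)
        for j in range(len(susp) - n + 1)
        for i in index.get(tuple(susp[j:j + n]), [])
    ]
    # Order by position in the suspicious document so passages grow left to right.
    return sorted(seeds, key=lambda s: (s[1], s[0]))


def merge_seeds(seeds: List[Tuple[int, int]], n: int, max_gap: int) -> List[Passage]:
    """Greedily extend the current passage with seeds lying within max_gap tokens of it in both documents."""
    passages: List[Passage] = []
    for i, j in seeds:
        if passages:
            s0, s1, t0, t1 = passages[-1]
            if j <= t1 + max_gap and s0 - max_gap <= i <= s1 + max_gap:
                passages[-1] = (min(s0, i), max(s1, i + n), t0, max(t1, j + n))
                continue
        passages.append((i, i + n, j, j + n))
    return passages


def detect(src_text: str, susp_text: str, n: int, max_gap: int) -> List[Passage]:
    """One alignment pass; n and max_gap are hypothetical knobs (larger n, smaller gap = stricter)."""
    src, susp = tokenize(src_text), tokenize(susp_text)
    return merge_seeds(ngram_seeds(src, susp, n), n, max_gap)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PAN plagdet score: F1 divided by log2(1 + granularity), where granularity >= 1."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)
```

For example, with `src = "the quick brown fox jumps over the lazy dog today"` and `susp = "yesterday the quick brown fox jumps over a lazy dog"`, the stricter call `detect(src, susp, n=3, max_gap=2)` stops at the substituted word and returns `[(0, 6, 1, 7)]`, while the more lenient `detect(src, susp, n=2, max_gap=3)` bridges the substitution and returns `[(0, 9, 1, 10)]`. This mirrors, in a simplified way, why pairing a strict pass with a lenient one can help on obfuscated text.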


Similar resources

HEVAL: Yet Another Human Evaluation Metric

Machine translation evaluation is a very important activity in machine translation development. Automatic evaluation metrics proposed in the literature are inadequate, as they require one or more human reference translations against which to compare the machine translation output. This does not always give accurate results, as a text can have several different translations. Human evaluation metri...

Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems

This paper describes Meteor 1.3, our submission to the 2011 EMNLP Workshop on Statistical Machine Translation automatic evaluation metric tasks. New metric features include improved text normalization, higher-precision paraphrase matching, and discrimination between content and function words. We include Ranking and Adequacy versions of the metric shown to have high correlation with human judgm...

Using Tectogrammatical Alignment in Phrase-Based Machine Translation

In this paper, we describe an experiment whose goal is to improve the quality of machine translation. Phrase-based machine translation, which is the state-of-the-art in the field of statistical machine translation, learns its phrase tables from large parallel corpora, which have to be aligned on the word level. The most common word-alignment tool is GIZA++. It is very universal and language ind...

Assessing the Accuracy of Discourse Connective Translations: Validation of an Automatic Metric

Automatic metrics for the evaluation of machine translation (MT) compute scores that characterize globally certain aspects of MT quality such as adequacy and fluency. This paper introduces a reference-based metric that is focused on a particular class of function words, namely discourse connectives, of particular importance for text structuring, and rather challenging for MT. To measure the acc...

Document-Level Machine Translation Evaluation with Gist Consistency and Text Cohesion

Current Statistical Machine Translation (SMT) is significantly affected by Machine Translation (MT) evaluation metrics. Nowadays, the emergence of document-level MT research increases the demand for corresponding evaluation metrics. This paper proposes two superior yet low-cost quantitative objective methods to enhance traditional MT metric by modeling document-level phenomena from the perspective...


Publication date: 2014